Multilingual Computational Semantic Lexicons in Action: The WYSINNWYG Approach to NLP. A Cross-linguistic Investigation on Spatially-based Expressions
Abstract
Much effort has been put into computational lexicons over the years, and most systems give much room to (lexical) semantic data. However, in these systems, the effort put into the study and representation of lexical items to express the underlying continuum existing in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has remained embryonic. A sense enumeration approach fails from a theoretical point of view to capture the core meaning of words, let alone relate word meanings to one another, and complicates the task of NLP by multiplying ambiguities in analysis and choices in generation. In this paper, I study computational semantic lexicon representation from a multilingual point of view, reconciling different approaches to lexicon representation: i) vagueness for lexemes which have a more or less fine-grained semantics with respect to other languages; ii) underspecification for lexemes which have multiple related facets; and iii) lexical rules to relate systematic polysemy to systematic ambiguity. I build on a What You See Is Not Necessarily What You Get (WYSINNWYG) approach to provide the NLP system with the "right" lexical data already tuned towards a particular task. In order to do so, I argue for a lexical semantic approach to lexicon representation. I exemplify my study through a cross-linguistic investigation on spatially-based expressions.
In this paper, I argue for computational semantic lexicons as active knowledge sources in order to provide Natural Language Processing (NLP) systems with the "right" lexical semantic representation to accomplish a particular task. In other words, lexicon entries are "pre-digested", via a lexical processor, to best fit an NLP task. This What You See (in your lexicon) Is Not Necessarily What You Get (as input to your program) (WYSINNWYG) approach requires the adoption of a symbolic paradigm.
Formally, I use a combination of three different approaches to lexicon representations: (1) lexico-semantic vagueness, for lexemes which have a more or less fine-grained semantics with respect to other languages (for instance, en in Spanish is vague between the Contact and Container senses of Location, whereas English is finer grained, with on for the former and in for the latter); (2) lexico-semantic underspecification, for lexemes which have multiple related facets (for instance, door, which is underspecified with respect to its Aperture or PhysicalObject meanings); and (3) lexical rules, to relate systematic polysemy to systematic ambiguity (such as …
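The vagueness and underspecification cases named above can be illustrated with a toy lexicon. The following is only a minimal sketch, not the paper's formalism: the `Entry` class, the `LEXICON` list, and the `candidates` function are hypothetical names introduced here, and the only data used are the en/on/in and door examples from the text. It shows how a lexical processor might "pre-digest" a vague Spanish entry into the finer-grained English choices that a generation task needs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """Hypothetical lexicon entry: a lemma, its language, and the concepts it covers."""
    lemma: str
    lang: str
    senses: frozenset
    kind: str  # "vague", "underspecified", or "specific" (illustrative labels)

# Toy data taken from the examples in the text; everything else is assumed.
LEXICON = [
    Entry("en", "es", frozenset({"Contact", "Container"}), "vague"),
    Entry("on", "en", frozenset({"Contact"}), "specific"),
    Entry("in", "en", frozenset({"Container"}), "specific"),
    Entry("door", "en", frozenset({"Aperture", "PhysicalObject"}), "underspecified"),
]

def candidates(lemma, source_lang, target_lang, context_sense=None):
    """Return target-language lemmas compatible with a source entry.

    Without context, every sense of the source entry stays open, so a vague
    entry yields several candidates. With a context sense, only the target
    entries covering that sense remain.
    """
    source = next(e for e in LEXICON if e.lemma == lemma and e.lang == source_lang)
    if context_sense is not None and context_sense not in source.senses:
        raise ValueError(f"{lemma!r} does not cover sense {context_sense!r}")
    wanted = {context_sense} if context_sense else set(source.senses)
    return sorted(
        e.lemma for e in LEXICON if e.lang == target_lang and e.senses & wanted
    )

print(candidates("en", "es", "en"))               # both English options stay open
print(candidates("en", "es", "en", "Container"))  # context narrows the choice
```

Storing a single vague entry for en, rather than one enumerated sense per English translation, keeps the Spanish lexicon free of ambiguity that only arises when the target language draws a finer distinction.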
Similar resources
Multilingual Computational Semantic Lexicons in Action: The WYSINNWYG Approach to NLP
Much effort has been put into computational lexicons over the years, and most systems give much room to (lexical) semantic data. However, in these systems, the effort put on the study and representation of lexical items to express the underlying continuum existing in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has remained embryonic. A sense enumeration approach fai...
Boosting Lexical Resources for the Semantic Web: Generative Lexicon and Lexicon Interoperability
Computational lexicons can play a key role in the Semantic Web: aiming at making word content machine-understandable, they intend to provide an explicit representation of word meaning, so that it can be directly accessed and used by computational agents, such as large-coverage parsers, modules for intelligent Information Retrieval or Information Extraction. In all these cases, semantic informat...
Standards & best practice for multilingual computational lexicons: ISLE MILE and more
ISLE (International Standards for Language Engineering) is a transatlantic standards oriented initiative under the Human Language Technology (HLT) programme within the EU-US International Research Co-operation. It is a continuation of the European EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, carried out through a number of subsequent projects funded by the Europ...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Development of the Multilingual Semantic Annotation System
This paper reports on our research to generate multilingual semantic lexical resources and develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools have an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an ...